AITopics | variable label


variable label

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Natural Language Processing Approach to Support Biomedical Data Harmonization: Leveraging Large Language Models

Li, Zexu, Prabhu, Suraj P., Popp, Zachary T., Jain, Shubhi S., Balakundi, Vijetha, Ang, Ting Fang Alvin, Au, Rhoda, Chen, Jinying

arXiv.org Artificial Intelligence, Nov-4-2024

Biomedical research requires large, diverse samples to produce unbiased results. Automated methods for matching variables across datasets can accelerate the harmonization needed to pool such samples. Research in this area has been limited, focusing primarily on lexical matching and ontology-based semantic matching. We aimed to develop new methods, leveraging large language models (LLMs) and ensemble learning, to automate variable matching.

Methods: We used data from two GERAS cohort studies (European and Japanese) to develop variable-matching methods. We first manually created a dataset by matching 352 EU variables with 1322 candidate JP variables; matched variable pairs served as positive instances and unmatched pairs as negative instances. Using this dataset, we developed and evaluated two types of natural language processing (NLP) methods that match variables based on the variable labels and definitions in data dictionaries: (1) LLM-based and (2) fuzzy matching. We then developed an ensemble-learning method, using a Random Forest (RF) model, to integrate the individual NLP methods. RF was trained and evaluated over 50 trials. Each trial used a random 4:1 split into training and test sets, with the model's hyperparameters optimized through cross-validation on the training set. For each EU variable, the 1322 candidate JP variables were ranked by NLP-derived similarity scores or by RF's probability scores, which denote each candidate's likelihood of matching the EU variable. Ranking performance was measured by top-n hit ratio (HRn) and mean reciprocal rank (MRR).

Results: E5 performed best among the individual methods, achieving an HR30 of 0.90 and an MRR of 0.70. RF outperformed E5 on all metrics over the 50 trials (P < 0.001), achieving an average HR30 of 0.98 and an average MRR of 0.73. LLM-derived features contributed most to RF's performance. One major cause of errors in automatic variable matching was ambiguous variable definitions within data dictionaries.
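The abstract ranks every candidate JP variable for each EU variable by a similarity score and then scores the ranking with HRn and MRR. The sketch below illustrates that rank-then-evaluate loop under stated assumptions: it uses Python's standard `difflib.SequenceMatcher` as a stand-in fuzzy matcher (the paper's own fuzzy-matching, E5, and RF scoring are not reproduced here), it assumes exactly one true match per query, and the variable labels and helper names are hypothetical, not GERAS data or the authors' code.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1] (difflib stand-in, not the paper's matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(query: str, candidates: list[str]) -> list[str]:
    """Sort candidate labels from most to least similar to the query label."""
    # sorted() is stable even with reverse=True, so tied candidates keep their input order.
    return sorted(candidates, key=lambda c: fuzzy_score(query, c), reverse=True)


def rank_of_true_match(query: str, candidates: list[str], true_match: str) -> int:
    """1-based position of the known matching label in the ranked candidate list."""
    return rank_candidates(query, candidates).index(true_match) + 1


def hit_ratio_at_n(ranks: list[int], n: int) -> float:
    """HRn: fraction of queries whose true match is ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


def mean_reciprocal_rank(ranks: list[int]) -> float:
    """MRR: mean of 1/rank of the true match across all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical labels for illustration only (not GERAS data):
# each EU-style label maps to its known matching JP-style label.
gold = {
    "Patient age at baseline": "Age of patient at baseline",
    "MMSE total score": "Mini-Mental State Examination total score",
    "Caregiver hours per week": "Weekly caregiver hours",
}
# Candidate pool: the true matches plus two hypothetical distractors.
candidates = list(gold.values()) + ["Patient weight (kg)", "Date of visit"]

ranks = [rank_of_true_match(q, candidates, t) for q, t in gold.items()]
print("ranks:", ranks)
print("HR1:", hit_ratio_at_n(ranks, 1), "HR3:", hit_ratio_at_n(ranks, 3))
print("MRR:", mean_reciprocal_rank(ranks))
```

Because ranking and evaluation only consume a score per (query, candidate) pair, `fuzzy_score` could be swapped for an embedding cosine similarity or a classifier's match probability, as the abstract describes for E5 and RF, without changing the HRn and MRR functions.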

derivation rule, large language model, machine learning

arXiv.org Artificial Intelligence

2411.0273

Country:

Asia > Japan (0.25)
North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.70)
Health & Medicine > Epidemiology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


The 5 Pitfalls of Document Labeling -- And How to Avoid Them -- TagWorks

#artificialintelligence, Oct-4-2019, 23:10:03 GMT

Don't let your annotation project bury you. Whether you call it "content analysis," "textual data labeling," "hand-coding," or "tagging," a growing number of researchers and data science teams are starting annotation projects these days. Many want human judgment labeled onto text so they can train AI (via supervised machine learning). Others have tried automated text analysis and found it wanting; now they're looking for ways to label text whose results are easier to interpret and explain.

annotation project, annotator, pitfall

#artificialintelligence

Country:

North America > United States > California > Alameda County > Oakland (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.35)


The five pitfalls of document labeling - and how to avoid them -- SAGE Ocean Big Data, New Tech, Social Science

#artificialintelligence, Sep-10-2019, 23:07:28 GMT

Whether you call it 'content analysis', 'textual data labeling', 'hand-coding', or 'tagging', a growing number of researchers and data science teams are starting annotation projects these days. Many want human judgment labeled onto text to train AI (via supervised machine learning). Others have tried automated text analysis and found it wanting; now they're looking for ways to label text whose results are easier to interpret and explain. Some just want what social scientists have always wanted: a way to analyze massive archives of human behavior (such as the Supreme Court's transcripts or diplomatic correspondence) at scale.

annotation project, sage ocean big data, variable label

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.98)
Information Technology > Data Science > Data Mining > Big Data (0.40)
